[Analysis] Clamp SelectOp divisibility when condConstancy reduces output contiguity #10359
…put contiguity In SelectOpAxisInfoVisitor's tensor-cond branch, the call to getDivisibilityFromContiguity sees only the lhs/rhs contiguities and can overestimate divisibility when condConstancy further reduces the output contiguity below either input's contiguity. Example: lhs c=8 d=8, rhs c=8 d=16, condConstancy=1. Output contiguity collapses to 1 (every position is a leader), but the helper returns gcd(8, 16) = 8 because c_lhs == c_rhs. The output value at position 1 may be 17, not divisible by 8. This is latent on the current pow2 lattice (gcd == min, and codegen vec_width is capped by contiguity, which is computed correctly), but it is a soundness regression introduced by triton-lang#7781. Fix is a conditional GCD with the output contiguity at the SelectOp callsite, preserving the existing semantics when condConstancy does not bind. Fixes triton-lang#10067.
// divisible only by gcd(d_src, p) <= gcd(d_src, outContig). Clamp
// divisibility by output contiguity to keep this sound.
// getDivisibilityFromContiguity itself does not see condConstancy.
int64_t div = getDivisibilityFromContiguity(lhsInfo, rhsInfo, d);
Could this be simplified as follows?
int64_t div =
    gcd(getDivisibilityFromContiguity(lhsInfo, rhsInfo, d),
        contiguity.back());
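For intuition: on the power-of-two lattice this analysis is described as using, a gcd of two values behaves like their min, so the gcd above acts as a clamp of the helper's result by the output contiguity. A small sketch to check that property; `gcdEqualsMinOnPow2` is a hypothetical helper and not part of Triton.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>

// Returns true if gcd(2^a, 2^b) == min(2^a, 2^b) for all 0 <= a, b <= maxShift.
bool gcdEqualsMinOnPow2(int maxShift) {
  assert(maxShift >= 0 && maxShift < 62); // keep 1 << shift inside int64_t
  for (int a = 0; a <= maxShift; ++a) {
    for (int b = 0; b <= maxShift; ++b) {
      const int64_t x = int64_t(1) << a;
      const int64_t y = int64_t(1) << b;
      if (std::gcd(x, y) != std::min(x, y))
        return false;
    }
  }
  return true;
}
```

The equality does not carry over to arbitrary integers: gcd(12, 8) is 4, while the min is 8.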
Done in 2fb17bd. Lit suite + pre-commit clean.
Are the failures related?
No. I debugged it and there's an IMA in triton_kernels. I'm trying to isolate the commit that caused the problem now.
Fixes #10067.

In `SelectOpAxisInfoVisitor`'s tensor-cond branch, the call to `getDivisibilityFromContiguity` only sees the lhs/rhs contiguities and can overestimate divisibility when `condConstancy` further reduces the output contiguity below either input's contiguity.

Concrete example

- lhs: `[8, 9, 10, 11, 12, 13, 14, 15]` (c=8, d=8)
- rhs: `[16, 17, 18, 19, 20, 21, 22, 23]` (c=8, d=16)
- `condConstancy = 1`

Output contiguity collapses to `gcd(8, 8, 1) = 1` (every position is a leader). But `getDivisibilityFromContiguity` sees `c_lhs == c_rhs == 8` and returns `gcd(8, 16) = 8`, without accounting for `condConstancy`. The output value at position 1 may be 17, which is not divisible by 8.

Why this didn't blow up

On the current pow2 lattice, `gcd == min` on powers of 2. Codegen's `vec_width = min(c, d/e, ...)` is bounded by contiguity, and contiguity is computed correctly. So the overestimated divisibility is never the binding constraint on `vec_width`. This is a latent soundness regression introduced by #7781: the pre-#7781 code clamped `divisibility` against the just-computed output contiguity, and that was sound independent of the pow2 invariant.

Fix

A conditional GCD with the output contiguity at the SelectOp callsite. The clamp fires only when `condConstancy` (or another shrinking factor) reduces the output contiguity strictly below at least one input's contiguity, i.e. it preserves the existing semantics when `condConstancy` is non-binding.

The helper `getDivisibilityFromContiguity` is left unchanged (its other callers, `MaxMinOpAxisInfoVisitor` and `AxisInfo::join`, don't have a `condConstancy`-equivalent, so the same gap doesn't exist there).

Test

Added `select_cond_constancy_clamps_divisibility` in `test/Analysis/test-alignment.mlir`. The test fails before the fix (`divisibility = [8]`) and passes after (`divisibility = [1]`).

New contributor declaration
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - `/test` for `lit` tests
  - `/unittest` for C++ tests
  - `/python/test` for end-to-end tests
  - `FILL THIS IN`.
- Select one of the following.
  - `lit` tests.
  - `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)